Ch02 - Data Models and Query Languages

Roots of RDBMS: business data processing on mainframes in the 1960s and ’70s.

- transaction processing (entering sales or banking transactions, airline reservations, stock-keeping in warehouses)
- batch processing (customer invoicing, payroll, reporting)

For a data structure like a résumé, which is mostly a self-contained document, a JSON representation can be quite appropriate.
Document: better locality
One-to-many relationships imply a tree structure, which JSON captures well
Many-to-one relationships - e.g. normalizing the city people live in to an ID
- not easy with a document database (join support is often weak)
- joins are emulated in application code rather than in the database
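
To make the two points above concrete, here is a minimal sketch of a résumé stored as one JSON document, with the many-to-one region reference resolved by a join emulated in application code. The field names, IDs, sample values and the `load_resume_with_region` helper are all assumptions made for illustration, not the API of any particular document database.

```python
import json

# Hypothetical résumé document: the one-to-many data (positions, education)
# is nested inside the user record, forming a tree.
resume_json = """
{
  "user_id": 251,
  "first_name": "Ada",
  "region_id": "us:91",
  "positions": [
    {"job_title": "Engineer", "organization": "Example Corp"},
    {"job_title": "Analyst", "organization": "Sample Ltd"}
  ],
  "education": [
    {"school_name": "Example University", "start": 1990, "end": 1994}
  ]
}
"""

# Hypothetical lookup collection holding the normalized many-to-one side.
regions = {"us:91": {"region_name": "Greater Seattle Area"}}


def load_resume_with_region(raw, regions_by_id):
    """Parse the document, then emulate the join with a second lookup."""
    doc = json.loads(raw)
    region = regions_by_id.get(doc["region_id"])
    doc["region_name"] = region["region_name"] if region else None
    return doc


resume = load_resume_with_region(resume_json, regions)
print(resume["region_name"])
print([p["job_title"] for p in resume["positions"]])
```

The nested lists come back with a single parse of the document (locality), while the region name needs an extra lookup that a relational database would express as a join.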
Many-to-many relationships
Not a new debate:
- Hierarchical model
- Network model

Document: schema flexibility, performance due to locality, sometimes closer to data structures used by application
Which data model leads to simpler application code?
- Document: if the data in your application has a document-like structure (i.e., a tree of one-to-many relationships, where typically the entire tree is loaded at once)
- Highly interconnected data: document is awkward, relational is acceptable, graph is most natural
Schema
- Document databases - schema-on-read (structure is implicit, interpreted when the data is read) - like dynamic type checking
  - advantageous if the items in the collection don’t all have the same structure for some reason
- Relational - schema-on-write (schema is explicit, enforced when the data is written) - like static/compile-time type checking
  - most relational database systems execute the ALTER TABLE statement in a few milliseconds; MySQL is a notable exception because it copies the entire table
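
A minimal sketch of schema-on-read, assuming a format change in which a full `name` field is being split so that newer documents also carry `first_name`. The `read_user` helper and the sample documents are hypothetical; the point is only that old documents are handled by the reading code, with no migration.

```python
def read_user(doc):
    """Interpret a stored document at read time (schema-on-read)."""
    user = dict(doc)  # copy, so the stored document is left untouched
    if "first_name" not in user and "name" in user:
        # Old-format document: derive the new field while reading.
        user["first_name"] = user["name"].split(" ")[0]
    return user


old_doc = {"name": "Ada Lovelace"}                         # written before the change
new_doc = {"name": "Grace Hopper", "first_name": "Grace"}  # written after the change

users = [read_user(d) for d in (old_doc, new_doc)]
print([u["first_name"] for u in users])
```

With schema-on-write, the same change would typically be an ALTER TABLE to add the column, followed by an UPDATE to backfill existing rows.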
Data locality for queries
- A document is usually stored as a single continuous string, so if you often need access to the entire document, there is a performance advantage
- Not limited to document db:
  - locality property in a relational model
    - Spanner: rows can be interleaved (nested) within a parent table
    - Oracle: multi-table index cluster tables
    - Bigtable, Cassandra and HBase: column families

Historically, data started out being represented as one big tree (the hierarchical model), but that wasn’t good for representing many-to-many relationships, so the relational model was invented to solve that problem.
Recently NoSQL:
- Document - when relationships across documents are rare
- Graph - when anything is related to anything
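
As a rough illustration of the graph case, the sketch below stores data as labeled edges and follows them to any depth, which is the kind of query that is awkward to express against nested documents. The vertices, edge labels and the `reachable` helper are made up for this example and do not correspond to a specific graph database or query language.

```python
from collections import deque

# Hypothetical edges: (tail vertex, label, head vertex).
edges = [
    ("lucy", "born_in", "idaho"),
    ("idaho", "within", "usa"),
    ("usa", "within", "north_america"),
    ("alain", "born_in", "beaune"),
    ("beaune", "within", "france"),
    ("france", "within", "europe"),
    ("lucy", "married_to", "alain"),
]


def reachable(start, edges, labels):
    """Return every vertex reachable from start via edges with the given labels."""
    outgoing = {}
    for tail, label, head in edges:
        if label in labels:
            outgoing.setdefault(tail, []).append(head)
    seen, queue = set(), deque([start])
    while queue:  # breadth-first traversal
        vertex = queue.popleft()
        for nxt in outgoing.get(vertex, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Everywhere Lucy was born "in", at any level of containment.
print(sorted(reachable("lucy", edges, {"born_in", "within"})))
```

The traversal does not need to know in advance how many `within` hops separate a town from a continent, which is what makes the graph model a natural fit when anything can be related to anything.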